Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries
نویسندگان
چکیده
Ebrahim Ansari ([email protected]) et al. 2017. Extracting bilingual per-sian italian lexicon from comparable corpora using different types of seed dictionaries. In " Applications of Comparable Corpora " edited book Berlin Linguistic Press (ed.). Bilingual dictionaries are very important in various fields of natural language processing. In recent years, research on extracting new bilingual lexicons from non-parallel (comparable) corpora have been proposed. Almost all use a small existing dictionary or other resource to make an initial list called the " seed dictionary ". In this paper we discuss the use of different types of dictionaries as the initial starting list for creating a bilingual Persian-Italian lexicon from a comparable corpus. Our experiments apply state-of-the-art techniques on three different seed dictionaries ; an existing dictionary, a dictionary created with pivot-based schema, and a dictionary extracted from a small Persian-Italian parallel text. The interesting challenge of our approach is to find a way to combine different dictionaries together in order to produce a better and more accurate lexicon. In order to combine seed dictionaries, we propose two different combination models and examine the effect of our novel combination models on various comparable corpora that have differing degrees of comparability. We conclude with a proposal for a new weighting system to improve the extracted lexicon. The experimental results produced by our implementation show the efficiency of our proposed models.
منابع مشابه
Adapted Seed Lexicon and Combined Bidirectional Similarity Measures for Translation Equivalent Extraction from Comparable Corpora
An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexiconwhich is used to bridge contexts in different languagesis adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by ...
متن کاملUtilizing Contextually Relevant Terms in Bilingual Lexicon Extraction
This paper demonstrates one efficient technique in extracting bilingual word pairs from non-parallel but comparable corpora. Instead of using the common approach of taking high frequency words to build up the initial bilingual lexicon, we show contextually relevant terms that co-occur with cognate pairs can be efficiently utilized to build a bilingual dictionary. The result shows that our model...
متن کاملA Combination of Models for Bilingual Lexicon Extraction from Comparable Corpora
In this paper we present a method to extract bilingual terminologies from comparable non-aligned corpora, by using multiple linguistic knowledge sources, such as: non-parallel corpora, bilingual thesauri, a preliminary bilingual dictionary, etc... We focus on two core technologies: bilingual lexicon extraction from comparable corpora and expansion through thesauri categories based on different ...
متن کاملAssisting Translators in Indirect Lexical Transfer
We present the design and evaluation of a translator’s amenuensis that uses comparable corpora to propose and rank nonliteral solutions to the translation of expressions from the general lexicon. Using distributional similarity and bilingual dictionaries, the method outperforms established techniques for extracting translation equivalents from parallel corpora. The interface to the system is av...
متن کاملAutomatic Generation of Bilingual Dictionaries Using Intermediary Languages and Comparable Corpora
This paper outlines a strategy to build new bilingual dictionaries from existing resources. The method is based on two main tasks: first, a new set of bilingual correspondences is generated from two available bilingual dictionaries. Second, the generated correspondences are validated by making use of a bilingual lexicon automatically extracted from non-parallel, and comparable corpora. The qual...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1701.08340 شماره
صفحات -
تاریخ انتشار 2017